How does a country's economic status impact its football performance on the international stage?

By Nitan Singh, Vladamir Rife, Ehaab Basil

SETUP AND DATA WRANGLING

Football (soccer) is the most popular sport on Earth, and as such, many countries at many different economic levels compete to be the best. However, some countries are clearly better than others at the sport. This project aims to determine whether the economy and stability of a country correlate significantly with the performance of its national team.

Our hypothesis is that countries with greater economic performance tend to have better national teams than those without it. To test this, we will use machine learning to analyze each nation's GDP (Gross Domestic Product), GDP per capita, and PPP (Purchasing Power Parity) against its total FIFA points as the dependent variable. We will only analyze the year 2016, as it is the most recent year available across all of the datasets.

Setup

Below are the packages and libraries we will use for working with our data.

Data Collection and Curation

The datasets we will use for this project include one table with each country's GDP, a second with each country's population, a third with each country's PPP, and a fourth with each country's fragility index, all acting as independent variables, plus a final table holding each country's FIFA ranking and total points as our dependent variable. The links to the GDP, population, PPP, and FIFA ranking data sources are provided below; the fragility index source is given in its own section later:

GDP: https://github.com/datasets/gdp/blob/master/data/gdp.csv

Population: https://github.com/datasets/population/blob/master/data/population.csv

PPP: https://github.com/datasets/ppp/blob/master/data/ppp-gdp.csv

FIFA ranking: https://github.com/cnc8/fifa-world-ranking/blob/master/fifa_ranking-2020-12-10.csv
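The links above point at GitHub's HTML viewer pages rather than at the CSV files themselves, so passing them straight to a reader such as `pandas.read_csv` would fail. The sketch below shows one way around this. It rewrites each link into its `raw.githubusercontent.com` form and assumes the files still live on the `master` branch at these paths, which may change if the repositories are reorganized. The helper name `to_raw_github_url` is our own.

```python
from urllib.parse import urlparse


def to_raw_github_url(blob_url: str) -> str:
    """Turn a github.com '/blob/' page URL into the matching raw-file URL."""
    parts = urlparse(blob_url)
    if parts.netloc != "github.com" or "/blob/" not in parts.path:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    # /<owner>/<repo>/blob/<branch>/<path>  ->  /<owner>/<repo>/<branch>/<path>
    path = parts.path.replace("/blob/", "/", 1)
    return f"https://raw.githubusercontent.com{path}"


SOURCES = {
    "gdp": "https://github.com/datasets/gdp/blob/master/data/gdp.csv",
    "population": "https://github.com/datasets/population/blob/master/data/population.csv",
    "ppp": "https://github.com/datasets/ppp/blob/master/data/ppp-gdp.csv",
    "fifa": "https://github.com/cnc8/fifa-world-ranking/blob/master/fifa_ranking-2020-12-10.csv",
}

# Raw CSV locations, one per source above.
RAW_SOURCES = {name: to_raw_github_url(url) for name, url in SOURCES.items()}
```

Each entry of `RAW_SOURCES` can then be passed to `pd.read_csv`.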

Why FIFA Total Points?

We chose the total points listed in the FIFA rankings database to be our measure of each country's performance in football since it is the most widely used metric for ranking national teams in men's soccer. The basic algorithm for computing the total points that produce these rankings is as follows:

The country's total points = (average match points over the previous 12 months) + (weighted average match points over the 36 months before that, with each earlier 12-month period weighted at 50%, 30%, and 20% respectively)

Furthermore, the point system for each match is as follows:

Points from a single match = M x I x T x C

M = 3, 1, 0, 1, or 2 depending on whether the country won, drew, lost, lost on penalties, or won on penalties respectively

I = 1, 2.5, 3, or 4 depending on whether the match was a friendly, a World Cup/confederation-level qualifier, a confederation-level final tournament match, or a World Cup final tournament match respectively

T = (200 - the opposing team's FIFA rank)

C = 1, 0.99, or 0.85 depending on whether the opposing team is in the CONMEBOL, UEFA, or AFC/CAF/OFC/CONCACAF confederations respectively
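To make the formula concrete, here is a small sketch that encodes the M, I, T, and C values exactly as listed above. The `total_points` helper assumes the pre-August-2018 FIFA weighting, which was the scheme in force for the 2016 rankings: 100% for the most recent 12 months, then 50%, 30%, and 20% for each earlier year. The string labels such as `"penalty_win"` and `"confederation_final"` are hypothetical names chosen for this example.

```python
from typing import Sequence

# Multipliers exactly as listed above (simplified pre-2018 FIFA scheme).
RESULT = {"win": 3, "draw": 1, "loss": 0, "penalty_loss": 1, "penalty_win": 2}
IMPORTANCE = {
    "friendly": 1.0,
    "qualifier": 2.5,            # World Cup or confederation-level qualifier
    "confederation_final": 3.0,  # match in a confederation-level final tournament
    "world_cup_final": 4.0,      # match in the World Cup final tournament
}
CONFEDERATION = {
    "CONMEBOL": 1.0,
    "UEFA": 0.99,
    "AFC": 0.85, "CAF": 0.85, "OFC": 0.85, "CONCACAF": 0.85,
}

# Assumed weights per 12-month period, most recent first.
PERIOD_WEIGHTS = (1.0, 0.5, 0.3, 0.2)


def match_points(result: str, importance: str, opponent_rank: int,
                 opponent_confederation: str) -> float:
    """Points from a single match: M x I x T x C."""
    m = RESULT[result]
    i = IMPORTANCE[importance]
    t = 200 - opponent_rank  # T as defined above
    c = CONFEDERATION[opponent_confederation]
    return m * i * t * c


def total_points(period_averages: Sequence[float]) -> float:
    """Weighted sum of average match points per 12-month period, most recent first."""
    return sum(w * avg for w, avg in zip(PERIOD_WEIGHTS, period_averages))
```

For example, beating the 50th-ranked team, a UEFA side, in a qualifier gives 3 x 2.5 x (200 - 50) x 0.99 = 1113.75 points.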

More information about this algorithm is available at: https://www.goal.com/en-us/news/fifa-world-ranking-how-it-is-calculated-what-it-is-used-for/16w60sntgv7x61a6q08b7ooi0p

Why GDP?

We chose GDP (Gross Domestic Product) as a metric for a country's wealth since it is a direct measurement of a country's monetary productivity and consumption. The basic algorithm for computing GDP is as follows:

The country's GDP = C + G + I + NX

C = total domestic consumer spending or private consumption expenditure

G = total government consumption expenditure and gross investment carried out under the government’s name

I = all capital expenditures or private domestic investments

NX = (the total value of the country's exports - the total value of the country's imports)
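As a worked example of the expenditure formula above, the sketch below adds up the four components for a hypothetical economy. All figures are invented for illustration and are in billions of the same currency.

```python
def gdp_expenditure(consumption, government, investment, exports, imports):
    """GDP = C + G + I + NX, with NX = exports - imports (all in the same units)."""
    net_exports = exports - imports
    return consumption + government + investment + net_exports


# Invented figures, in billions of the same currency.
print(gdp_expenditure(consumption=700, government=200, investment=150,
                      exports=120, imports=170))  # 1000
```

Here NX = 120 - 170 = -50, so GDP = 700 + 200 + 150 - 50 = 1000. A trade deficit lowers GDP even though it adds nothing to C, G, or I.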

More information about this algorithm is available at: https://www.analyticssteps.com/blogs/introduction-gross-domestic-product-gdp

Why GDP Per Capita?

We chose GDP per capita as a metric for a country's wealth since it measures a country's monetary productivity and consumption relative to its population. The basic algorithm for computing GDP per capita is as follows:

The country's GDP Per Capita = (The country's GDP / The country's population)
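Computing GDP per capita means matching each country's GDP with its population for the same year. The sketch below assumes both tables have already been reduced to 2016 values keyed by a shared country code. The GDP and population CSVs both appear to carry a `Country Code` column, but check your copies. The codes and figures below are invented.

```python
def gdp_per_capita(gdp_by_country, population_by_country):
    """GDP / population for every country present in both mappings."""
    per_capita = {}
    for code, gdp in gdp_by_country.items():
        population = population_by_country.get(code)
        if population:  # skip countries with a missing or zero population
            per_capita[code] = gdp / population
    return per_capita


# Invented codes and figures, purely for illustration.
gdp_2016 = {"AAA": 2.0e12, "BBB": 5.0e10, "CCC": 1.0e9}
population_2016 = {"AAA": 50_000_000, "BBB": 2_500_000}
print(gdp_per_capita(gdp_2016, population_2016))  # {'AAA': 40000.0, 'BBB': 20000.0}
```

"CCC" is dropped because it has no population entry. Silently skipping such countries, rather than failing, mirrors what an inner merge of the two tables would do.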

Why PPP?

PPP (Purchasing Power Parity) measures how much a standard basket of goods costs in a country's currency compared with another currency (Ex: USD, Euro, Russian Ruble, etc.). It therefore indicates how affordable goods are in that country. We chose this as a metric for a country's wealth since it is a good measure of how financially comfortable it is for citizens to live in their country of origin. PPP can be calculated using the following formula:

The country's PPP = (cost of a basket of goods in the country's currency) / (cost of the same basket of goods in another country's currency, usually US dollars)
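Here is a quick numeric illustration of the ratio above, using invented prices. If a basket costs 4,500 units of local currency and the same basket costs 100 US dollars, the PPP conversion factor is 45 local units per dollar. Dividing that factor by the market exchange rate gives the price level ratio, which shows how cheap the basket is locally. The function names are our own.

```python
def ppp_conversion_factor(basket_cost_local, basket_cost_reference):
    """Local currency units needed to buy what one reference-currency unit buys."""
    return basket_cost_local / basket_cost_reference


def price_level_ratio(ppp_factor, market_exchange_rate):
    """Below 1: the basket is cheaper locally than market exchange rates suggest."""
    return ppp_factor / market_exchange_rate


factor = ppp_conversion_factor(4500, 100)  # invented prices: 4,500 local units vs 100 USD
ratio = price_level_ratio(factor, 90)      # invented market rate: 90 local units per USD
print(factor, ratio)  # 45.0 0.5
```

A ratio of 0.5 means the basket costs half as much locally as converting dollars at the market rate would suggest.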

More information about this algorithm is available at: https://www.ig.com/en/trading-strategies/what-is-purchasing-power-parity--ppp---191106

Read the data from each of the GitHub tables into a dataframe. We are only interested in the rows concerning individual countries, so we will delete any row with information about a country grouping (Ex: "Arab World"). We also need to drop any unnecessary columns and rename any columns that would conflict with the other tables each one will be merged with.
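Here is a sketch of this cleaning step in pandas. It assumes the GDP and population CSVs use the columns `Country Name`, `Country Code`, `Year`, and `Value`; check the headers of your download. The `AGGREGATES` set is a hypothetical, deliberately incomplete list of grouping rows, since the full tables contain many more (regions, income groups, and so on).

```python
import pandas as pd

# Hypothetical and deliberately incomplete list of grouping rows.
AGGREGATES = {"Arab World", "World", "European Union", "High income", "Low income"}


def clean_indicator_table(df: pd.DataFrame, value_name: str, year: int = 2016) -> pd.DataFrame:
    """Keep one year of country-level rows and give the value column a unique name.

    Assumes the columns "Country Name", "Country Code", "Year", and "Value".
    """
    is_country = ~df["Country Name"].isin(AGGREGATES)
    out = df[(df["Year"] == year) & is_country]
    out = out[["Country Name", "Country Code", "Value"]]
    # Renaming "Value" avoids column clashes when indicator tables are merged.
    return out.rename(columns={"Value": value_name}).reset_index(drop=True)


# Tiny invented example standing in for the downloaded GDP table.
raw_gdp = pd.DataFrame({
    "Country Name": ["Arab World", "Brazil", "Brazil", "Ghana"],
    "Country Code": ["ARB", "BRA", "BRA", "GHA"],
    "Year": [2016, 2016, 2015, 2016],
    "Value": [1.0, 2.0, 3.0, 4.0],
})
gdp = clean_indicator_table(raw_gdp, "GDP")
print(gdp)
```

A hand-written list of groupings is easy to get wrong. An alternative is to keep only the country codes that also appear in the FIFA table once the tables are merged.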

MACRO INDICATORS vs WORLD

To get a general idea of whether there is a correlation between the economic factors we chose and the FIFA standings of the countries of the world, we will run a linear regression of each country's total FIFA points against its GDP, GDP per capita, and PPP, and examine the resulting p-value. If the p-value is <= 0.05, there is a significant correlation; if not, we cannot conclude that there is a correlation. The resulting line and scatter plot will also be plotted to display these results visually.
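The regression and significance check described above can be sketched with `scipy.stats.linregress`. It returns the slope, intercept, and correlation coefficient, plus the two-sided p-value for the null hypothesis that the slope is zero. The wrapper `regress_points` and its 0.05 threshold follow the rule stated above; the tutorial's own code may have used a different library.

```python
from scipy import stats


def regress_points(x, y, alpha=0.05):
    """Fit y = slope * x + intercept and flag whether the slope is significant."""
    fit = stats.linregress(x, y)
    return {
        "slope": fit.slope,
        "intercept": fit.intercept,
        "r": fit.rvalue,
        "p_value": fit.pvalue,  # two-sided test of "slope == 0"
        "significant": fit.pvalue <= alpha,
    }


# Invented, nearly linear data: points rise with the indicator.
indicator = list(range(1, 11))
points = [2 * v + 1 + (0.5 if v % 2 else -0.5) for v in indicator]
print(regress_points(indicator, points))
```

A large p-value only means the data give no evidence of a linear relationship. It does not show that the two variables are unrelated.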

According to the graphs and their associated p-values, GDP and GDP per capita have a significant correlation with points for the entire world, while PPP's correlation is insignificant. However, individual countries are hard to see at this scale, so zooming in and separating the countries by confederation will allow for better visualization and analysis.

MACRO INDICATORS vs CONFEDERATIONS

To get a better idea of whether there is a correlation between a country's GDP, GDP per capita, and PPP and its FIFA standing, we will examine the same linear regressions as before, but specific to each confederation. This way, the regional factors and circumstances affecting each country are partly accounted for, since each country is only compared with others in its own confederation.

If the p-value is <= 0.05, there is a significant correlation; if not, we cannot conclude that there is a correlation. The resulting line and scatter plot will also be plotted to display these results visually.
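Running the same regression once per confederation fits naturally with a pandas `groupby`. This sketch assumes the merged table has a `confederation` column and a `total_points` column, as in the FIFA ranking CSV; adjust the names if yours differ. It skips groups with fewer than three countries, since a two-point fit leaves no degrees of freedom for a p-value. The function name is our own.

```python
import pandas as pd
from scipy import stats


def regress_by_confederation(df, x_col, y_col="total_points", group_col="confederation",
                             alpha=0.05, min_countries=3):
    """Run one linear regression of y_col on x_col per confederation."""
    rows = []
    for confed, group in df.groupby(group_col):
        group = group.dropna(subset=[x_col, y_col])
        if len(group) < min_countries:
            continue  # too few countries to estimate a p-value
        fit = stats.linregress(group[x_col], group[y_col])
        rows.append({
            group_col: confed,
            "n": len(group),
            "slope": fit.slope,
            "p_value": fit.pvalue,
            "significant": fit.pvalue <= alpha,
        })
    return pd.DataFrame(rows)
```

Calling it once per indicator, e.g. `regress_by_confederation(table, "GDP")`, yields one small summary table per economic factor.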

The scatter plots above plot GDP vs points for all countries, with plots separated by confederation. According to the p-values, there is a significant correlation between GDP and points for the AFC and UEFA confederations, with all other confederations having an insignificant correlation.

The scatter plots above plot GDP per capita vs points for all countries, with plots separated by confederation. According to the p-values, there is a significant correlation between GDP per capita and points for the AFC and CONMEBOL confederations, with all other confederations having an insignificant correlation.

The scatter plots above plot PPP vs points for all countries, with plots separated by confederation. According to the p-values, the correlation between PPP and points is insignificant for all confederations.

To synthesize the above findings:

GDP and GDP per capita both had significant correlations with points for AFC.

Only GDP had a significant correlation with points for UEFA.

Only GDP per capita had a significant correlation with points for CONMEBOL.

PPP was insignificant for all confederations.

External Factors?

Before plotting the individual confederations, the GDP and GDP per capita plots of all the countries might have led us to believe that countries with higher values in both tend to have better-performing national teams. However, when we zoom in on the individual confederations, we see that OFC, CONMEBOL, CONCACAF, and CAF do not have a low enough p-value to conclusively correlate with GDP. Likewise, OFC, CONCACAF, CAF, and UEFA do not correlate conclusively with GDP per capita. Thus, there may be another, more significant factor at play that we can search for.

To this end, we will examine the stability of each country. Our hypothesis is that the more stable the livelihoods of a country's citizens are, the more time some citizens might have to enjoy and develop skills in recreational activities like soccer. A suitable metric for measuring this stability is each country's fragility index.

Fragility Index

Why fragility index?

A country's Fragility Index is a measure that assesses its vulnerability to conflict or collapse based on certain factors (Brain drain, External intervention, etc.). We chose this as a metric for a country's stability since the higher the score, the less stable a country is. A country's fragility index is calculated by a conflict assessment framework called CAST (Conflict Assessment System Tool), which is detailed in the documentation at the following source: https://fragilestatesindex.org/indicators/

Download the Fragility Index Excel file from the following page, open it in any spreadsheet program, and save it as a CSV file with the filename 'country_stability_2016.csv': https://fragilestatesindex.org/excel/

Make a scatter plot of each confederation's fragility scores against the countries' total FIFA points. Invert the fragility score axis so that lower fragility indices (higher stability) appear further in the positive direction, and plot a linear regression line to show how the data trends. Likewise, print the resulting p-value of the regression for each plot to show whether the countries' fragility index correlates with their total FIFA points within that confederation.
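Here is one way to draw a single confederation's panel with matplotlib, assuming fragility is on the x-axis. `ax.invert_xaxis()` flips the axis so that more stable countries appear further to the right, and the fitted line comes from `scipy.stats.linregress`. The function name, and the choice to print and return the p-value for each confederation, are our own.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def plot_fragility_vs_points(fragility, points, title):
    """Scatter one confederation's fragility scores against total points, with a fitted line."""
    fragility = np.asarray(fragility, dtype=float)
    points = np.asarray(points, dtype=float)
    fit = stats.linregress(fragility, points)

    fig, ax = plt.subplots()
    ax.scatter(fragility, points)
    xs = np.linspace(fragility.min(), fragility.max(), 100)
    ax.plot(xs, fit.intercept + fit.slope * xs, color="red")
    ax.invert_xaxis()  # lower fragility (more stable) now sits further to the right
    ax.set_xlabel("Fragility index (inverted)")
    ax.set_ylabel("Total FIFA points")
    ax.set_title(title)
    print(f"{title}: p-value = {fit.pvalue:.6f}")
    return fig, ax, fit.pvalue
```

Because only the axis direction changes, the slope keeps its sign. A negative slope (more fragile, fewer points) still appears as a line rising to the right.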

The scatter plots above plot fragility index vs points for all countries, with plots separated by confederation. According to the p-values, there is a significant correlation between the fragility index and points for the AFC, UEFA, CAF, CONCACAF, and OFC confederations, with only the CONMEBOL confederation having an insignificant correlation.

The above heatmap visualizes the correlations between total points and each indicator, including fragility, for all teams. Strong correlations appear as dark or light colors, depending on the direction of the correlation. This verifies our previous observation that the fragility index has the most significant correlation with a soccer team's total points. Its correlation with total points has the largest absolute value of all the indicators.
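The comparison behind this conclusion ranks indicators by the absolute value of their correlation with total points. It can be sketched in pandas as follows. The column name `total_points` is assumed. The heatmap itself can be drawn from the same matrix, for example with `seaborn.heatmap(df.corr(numeric_only=True), annot=True)`.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str = "total_points") -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest (by |r|) first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Sort by magnitude so strong negative correlations rank alongside strong positive ones.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

Ranking by |r| matters here because fragility correlates negatively with points. A plain descending sort would push it to the bottom.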

Our initial hypothesis has been largely refuted, as GDP, GDP per capita, and PPP have proven to be mostly insignificant. However, the fragility index appears to have some significance, so it may be worthwhile to investigate this aspect further. The fragility index is a composite index, meaning it is composed of multiple micro-indicators. By comparing these micro-indicators with total points, we could isolate the specific indicators that correlate best with total points.

To quickly visualize the correlation between each micro-indicator of the fragility index and total points, we will use a heatmap.

This heatmap shows that the 2 indicators that best correlate with total FIFA ranking points are X1: External Intervention and E3: Human flight and Brain Drain. Both have a coefficient of -0.48, which in absolute value is 0.03 higher than the baseline coefficient of -0.45 for the overall fragility index. Therefore, these 2 indicators may correlate slightly better with total points than the fragility index alone.

The p-values above all confirm what we found in the heatmap: X1: External Intervention and E3: Human flight and Brain Drain have the strongest correlation with total points. However, E3: Human flight and Brain Drain has a lower p-value than X1: External Intervention, and therefore stronger evidence of a correlation. E3 is thus the candidate for further examination, to see if it can be a better indicator than the fragility index itself.

BRAIN DRAIN

"E3: Human flight and Brain Drain" is defined as the emigration of the economically productive populations of a country due to economic deterioration in their home country and the hope of better opportunities afar. Merge this information into the table using the same function as before and display the head of the table to show the change.

We will analyze the Brain Drain indicator the same way we analyzed our previous indicators, with a linear regression for each confederation. This keeps our analysis methodologically consistent and avoids introducing new sources of bias.
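The merge step described above can be sketched in pandas. The sketch assumes the fragility CSV has a `Country` column and an indicator column headed `E3: Human Flight and Brain Drain`; check the exact header in your download. It also assumes the main table is keyed by the same country names. Country spellings often differ between sources, so the helper `add_indicator` (our own name) also returns the countries that found no match.

```python
import pandas as pd

BRAIN_DRAIN = "E3: Human Flight and Brain Drain"  # assumed header in the FSI CSV


def add_indicator(table: pd.DataFrame, fsi: pd.DataFrame, column: str, key: str = "Country"):
    """Left-merge one fragility indicator onto the main table and report unmatched countries."""
    merged = table.merge(
        fsi[[key, column]],
        on=key,
        how="left",              # keep every country in the main table
        validate="many_to_one",  # raise if the FSI table lists a country twice
    )
    unmatched = merged.loc[merged[column].isna(), key].tolist()
    return merged, unmatched
```

Reviewing the `unmatched` list before running the regressions makes it obvious when a country was dropped only because of a spelling mismatch.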

According to the above graphs and their p-values, Brain Drain is significantly correlated with points for the countries of all confederations except OFC. And since OFC contains about as few member nations as CONMEBOL, Brain Drain would still be considered a better overall indicator, as its correlation holds for all other confederations despite the drawback in OFC.

Comparing our p-values shows that Brain Drain lowers the p-value, strengthening the evidence of correlation, for UEFA, CAF, CONCACAF, and CONMEBOL. It raises the p-value for AFC and OFC. The increase for AFC and the decreases for UEFA, CAF, and CONCACAF are negligible, because all of their p-values are very close to 0 regardless. CONMEBOL changed the most, as its p-value went from 0.138098, an insignificant value, to 0.014485, a significant value. In contrast, the p-value for OFC went from 0.022709 to 0.325008, losing its significance.
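The before/after comparison above comes down to checking which confederations cross the 0.05 threshold. Here is a small sketch using the two pairs of p-values reported above; the function name is our own.

```python
def significance_changes(before, after, alpha=0.05):
    """Return confederations whose p-value crossed `alpha`, and in which direction."""
    changes = {}
    for confed in sorted(before.keys() & after.keys()):
        was_significant = before[confed] <= alpha
        is_significant = after[confed] <= alpha
        if was_significant != is_significant:
            changes[confed] = "gained" if is_significant else "lost"
    return changes


# p-values reported above: fragility index (before) vs Brain Drain (after).
fragility_p = {"CONMEBOL": 0.138098, "OFC": 0.022709}
brain_drain_p = {"CONMEBOL": 0.014485, "OFC": 0.325008}
print(significance_changes(fragility_p, brain_drain_p))  # {'CONMEBOL': 'gained', 'OFC': 'lost'}
```

Confederations whose p-values stay on the same side of the threshold, like AFC or UEFA here, are not reported. Their small shifts do not change the conclusion.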

Conclusions

Takeaways:

Future Possibilities

The analytic method we used to quantify the significance of an indicator was rather simple: isolate the variable in its own column, compare it against the dependent variable via a linear regression, and check the significance with the p-value. We were consistent with our methodology, which limits unnecessary skews due to changes in analytic approach. And even though we didn't find a definitive set of indicators that correlate with a country's football performance, we haven't analyzed everything. There are many other factors, from climate conditions to government spending breakdowns, that could be related to football performance. Additionally, other models such as polynomial regression, SVMs, or even neural networks could yield better predictions than a linear regression. As a final takeaway, this tutorial is a mere stepping stone among the wide range of analytical approaches one could take to explain which aspects of a country most influence its professional football performance.